Gene Expression Profile as a Predictor of Seizure Liability

Analysis platforms to predict drug-induced seizure liability at an early phase of drug development would improve safety and reduce attrition and the high cost of drug development. We hypothesized that a drug-induced in vitro transcriptomics signature predicts its ictogenicity. We exposed rat cortical neuronal cultures to non-toxic concentrations of 34 compounds for 24 h; 11 were known to be ictogenic (tool compounds), 13 were associated with a high number of seizure-related adverse event reports in the clinical FDA Adverse Event Reporting System (FAERS) database and systematic literature search (FAERS-positive compounds), and 10 were known to be non-ictogenic (FAERS-negative compounds). The drug-induced gene expression profile was assessed from RNA-sequencing data. Transcriptomics profiles induced by the tool, FAERS-positive and FAERS-negative compounds, were compared using bioinformatics and machine learning. Of the 13 FAERS-positive compounds, 11 induced significant differential gene expression; 10 of the 11 showed an overall high similarity to the profile of at least one tool compound, correctly predicting the ictogenicity. Alikeness-% based on the number of the same differentially expressed genes correctly categorized 85%, the Gene Set Enrichment Analysis score correctly categorized 73%, and the machine-learning approach correctly categorized 91% of the FAERS-positive compounds with reported seizure liability currently in clinical use. Our data suggest that the drug-induced gene expression profile could be used as a predictive biomarker for seizure liability.


Introduction
Seizures related to drug use are severe adverse reactions that can prevent the drug from entering the market or lead to its withdrawal [1,2]. According to recent analysis, the cost of developing a new drug and achieving marketing approval is approximately $1.2 billion [3]. Therefore, the early detection of factors that might decrease the risk of later attrition would have a major influence on drug development.
Several in vitro and in vivo analysis platforms with medium throughput have been used for the early detection of the seizure liability (ictogenicity) of novel drug candidates [4]. These platforms include brain slice and multi-electrode array analysis of drug-induced epileptiform activity in rodent cortical neurons or slices, or human induced pluripotent cells [5][6][7][8][9][10][11][12][13][14], and the analysis of motor behavior generated by zebrafish larvae exposed to convulsant drugs [15,16]. Despite the promising data and increasingly stronger predictive values achieved by applying machine-learning analysis approaches, the predictive accuracy remains limited or has not yet been vigorously validated in different laboratories [17].
Recent in vivo analyses of the brain tissue from rodent models of epilepsy or from surgically removed human epileptic tissue revealed aberrant transcriptional regulation, acquired channelopathies, acquired synaptopathies, and neuroinflammation as mechanisms contributing to seizure generation [18,19]. Similarly, seizures induced in vivo by

Number of Differentially Expressed (DE) Genes in Rat Cortical Neuronal Cell Cultures Is Dose-Dependent
First, we performed a dose-optimization study to demonstrate that a 24-h expo of rat primary cortical cell cultures to various compounds will induce gene expressio non-toxic doses. Of the five compounds studied, four (all but aminophylline) induc dose-dependent increase in the number of DE genes when both the false discovery (FDR) < 0.05 and the log2 transform of the absolute value of the fold-change (FC) > were used as criteria ( Table 1). The greatest dose-dependency was found after expo of the cultures to pentylenetetrazol, as 1000 µ M did not induce any DE genes whe 20,000 µ M induced 851 DE genes.    Of the 10 tool compounds, the highest number of DE genes was induced by amoxapine (2659), followed by SNC80 (2308) and pentylenetetrazol (1014). The numbers of DE genes induced by the remaining tool compounds ranged from 21 to 1014.
Of the 13 FAERS-positive compounds, the highest number of DE genes (3695) was induced by clozapine. The number of DE genes induced by the remaining FAERS-positive compounds ranged from 3 to 827.
Of the 11 FAERS-negative compounds, the highest number of DE genes was induced by imiquimod (2214), then rosiglitazone (1827). The number of DE genes induced by the remaining FAERS-negative compounds ranged from 7 to 1686.

Similarities in Gene Expression between the Tool, FAERS-Positive, and FAERS-Negative Compounds
Venn diagrams show 2263 common DE genes between the tool, FAERS-positive, and FAERS-negative compounds when both the upregulated and downregulated genes were included in the analysis (Figure 2A). If only the upregulated DE genes were included, 303 DE genes were common to all the compound categories ( Figure 2B). If only the downregulated DE genes were included, 1541 DE genes were common to all the compound categories ( Figure 2C).
The number of the same DE genes induced by each of the 34 compounds is crosstabulated in Table 3.
Venn diagrams show 2263 common DE genes between the tool, FAERS-positive, and FAERS-negative compounds when both the upregulated and downregulated genes were included in the analysis (Figure 2A). If only the upregulated DE genes were included, 303 DE genes were common to all the compound categories ( Figure 2B). If only the downregulated DE genes were included, 1541 DE genes were common to all the compound categories ( Figure 2C). The number of the same DE genes induced by each of the 34 compounds is crosstabulated in Table 3.

Pathways Enriched by Tool, FAERS-Positive, and FAERS-Negative Compounds
An enrichment analysis of DE genes found a total of 736 enriched Reactome pathways induced by the 34 compounds (Table S1).
The 10 tool compounds induced a total of 333 pathways. Interestingly, all the tool compounds, except pilocarpine, resulted in enriched pathways.

Machine-Learning Using Normalized RNA-Sequencing (RNA-Seq) Read Counts Differentiates Tool Compounds from Other Compounds, but Not FAERS-Positive or FAERS-Negative Compounds
One-vs.-rest machine learning (ML) classification with regularized logistic regression (LR) and support vectors machines (SVM) was applied to assess the separability of compounds in each group (tool, FAERS-positive, FAERS-negative) from the compounds in the remaining groups based on normalized read counts. Table 4 presents the classification performance metrics from the cross-validation. A moderate separation (LR area under the curve (AUC) 0.76 and SVM AUC 0.78) of the tool compounds from the FAERS-positive or FAERS-negative compounds was possible using the normalized RNA-seq counts. The Both the FAERS-positive and FAERS-negative compounds also had enrichment in "apoptosis", "synapse assembly", "axonogenesis", "axon extension", "synaptic transmission", and "neurotransmitter transport" GO terms.

Machine-Learning Using Normalized RNA-Sequencing (RNA-Seq) Read Counts Differentiates Tool Compounds from Other Compounds, but Not FAERS-Positive or FAERS-Negative Compounds
One-vs.-rest machine learning (ML) classification with regularized logistic regression (LR) and support vectors machines (SVM) was applied to assess the separability of compounds in each group (tool, FAERS-positive, FAERS-negative) from the compounds in the remaining groups based on normalized read counts. Table 4 presents the classification performance metrics from the cross-validation. A moderate separation (LR area under the curve (AUC) 0.76 and SVM AUC 0.78) of the tool compounds from the FAERS-positive or FAERS-negative compounds was possible using the normalized RNA-seq counts. The performance in separating FAERS-positive and FAERS-negative compounds from the rest, however, was poor (i.e., FAERS-positive from tool and FAERS-negative; FAERS-negative from tool and FAERS-positive), implying the lack of a clear pattern separating the groups. This lack of clear separation is supported by the UMAP reduction in Figure 5A, where no clear clustering by any of the compound categories was observed.

Machine-Learning Separated FAERS-Positive from FAERS-Negative Compounds Utilizing Differential Gene Expression Shared with Tool Compounds
Next, we assessed the number of tool compounds exhibiting any similarity to the FAERS-positive or FAERS-negative compounds to differentiate the FAERS-positive from the FAERS-negative compounds. Similarity was defined as having one or more DE genes with a positive FC in common with a given tool compound. The leave-one-out cross-validation area under the receiver-operator-characteristic curve for the approach was 0.77 (sensitivity of 0.77, specificity of 1.0, accuracy of 0.87). The p-value for accuracy above a no information rate of 0.57 was 0.002. The approach correctly classified 10 of the 13 FAERS-positive compounds. The exceptions included paroxetine, diphenhydramine, and venlafaxine, which were incorrectly classified as FAERS-negative compounds. Figure 5B shows the UMAP reduction for vectors, consisting of the number of DE genes with a positive FC shared by each FAERS-positive and-negative compound with different tool compounds. The separation between the FAERS-positive and FAERS-negative compounds is significantly clearer in this features space than in the space of normalized counts ( Figure 5A). Table 4. One-vs.-rest classification performance per compound group. Task denotes the compound groups attempted to separate and classifier denotes the ML model utilized in the classification. While tool compounds can be separated moderately well from the FAERS-positive (+) and FAERS-negative (−) compounds, the poor classification performance for the remaining groups implies the lack of a clear pattern distinguishing the FAERS− and FAERS+ compounds. performance in separating FAERS-positive and FAERS-negative compounds from the rest, however, was poor (i.e., FAERS-positive from tool and FAERS-negative; FAERS-negative from tool and FAERS-positive), implying the lack of a clear pattern separating the groups. This lack of clear separation is supported by the UMAP reduction in Figure 5A, where no clear clustering by any of the compound categories was observed.

Alikeness-% and Gene Set Enrichment Analysis (GSEA) Score
Alikeness-%. The alikeness-% was generated to compare the number of common DE genes between the FAERS-positive and tool compounds. The alikeness-% sum for clozapine was 221.5, which was the highest sum among all 13 FAERS-positive compounds (Table 5). Mirtazapine, aminophylline, and bupropion had comparable alikeness-% sums, ranging from 50 to 70. Maprotiline, theophylline, amitriptyline, temozolomide, tramadol, and venlafaxine hydrochloride had alikeness-% sums ranging from 5 to 50. As expected by the low number of DE genes, isoniazid, paroxetine, and diphenhydramine had alikeness-% sums <1, indicating that, similar to tool compounds, they did not induce gene expression changes. GSEA score. Clozapine had a GSEA score of 28.8, which was the highest of all the FAERS-positive compounds ( Table 6). Mirtazapine had the next highest score, 20.1. Bupropion and amitriptyline had a GSEA score within a comparable range, from 7 to 12. The theophylline, maprotiline, and venlafaxine hydrochloride scores ranged from 5 to 3. Interestingly, clozapine, mirtazapine, and bupropion were the top three FAERS-positive compounds with a high alikeness-% sum (Table 5). No GSEA score could be calculated for tramadol, temozolomide, isoniazid, aminophylline, or diphenhydramine.
Taken together. Alikeness-% correctly classified 9 of the 13 FAERS-positive compounds when the cut-off for the alikeness-% sum was set to >10%. The GSEA score correctly classified 8 of the 13 FAERS-positive compounds (Table 7). Amitriptyline, aminophylline, bupropion, clozapine, maprotiline, mirtazapine, and theophylline (7 of the 13 compounds) were classified correctly by both scoring systems. Alikeness-% also correctly classified temozolomide and tramadol, which were not detected by the GSEA score. The GSEA score detected venlafaxine hydrochloride, which had an alikeness-% of 7.4%. Both scoring systems failed to detect diphenhydramine (three DE genes) and paroxetine (no DE genes). Machine learning failed to correctly classify diphenhydramine and paroxetine as well as venlafaxine hydrochloride (151 DE genes). Importantly, the falsely classified compounds had a very small number or no DE genes ( Table 2). Isoniazid with only 32 DE genes, however, was correctly classified into the FAERS-positive category by the machine-learning approach but not by the alikeness-% or the GSEA score. If the two compounds (diphenhydramine with three DE genes, paroxetine with no DE genes) were excluded from the FAERS-positive category originally containing 13 compounds, the alikeness-% correctly classified 9 of 11 (isoniazid and venlafaxine incorrectly categorized), the GSEA score correctly classified 8 of 11 (isoniazid, temozolomide, and tramadol incorrectly categorized) and the machine-learning approach correctly classified 10 of 11 (venlafaxine incorrectly categorized) of the FAERS-positive compounds. Table 6. Sum of the normalized enrichment score (NES) base values for the FEARS-positive compounds (GSEA score) and the number of enriched gene sets in the enrichment analysis with FEARSpositive drugs and tool compounds. For each tool compound, a gene set was generated from the upregulated and downregulated genes, which were then compared to a ranked list of each FEARS-positive compound.  Table 7. Summary of correct categorization of FAERS-positive compounds by different analysis approaches. Diphenhydramine and paroxetine were excluded from the summary due to the low number of differentially expressed (DE) genes. The higher the rank, the greater the alikeness-% or GSEA score.

Discussion
The present study aimed to develop a transcriptomics-based in vitro assay for the early detection of drug-induced seizure liability. Rat cortical neuronal cultures were exposed for 24 h to non-toxic concentrations of compounds with known, suspected, or no seizure liability. The RNA-seq data were analyzed using bioinformatics and machine-learning approaches. We had four major findings. We hypothesized that the gene expression profile of the FAERS-positive compounds would be comparable to that of the tool compounds.
(1) Each of the tool compounds with a known ictogenic effect induced a compound-specific transcriptomics signature. We were unable to identify any "common ictogenic signature".
(2) Also, the FAERS-negative compounds with no or very low seizure liability induced a remarkable gene expression profile, which was considered "noise" in the analysis platform.

Gene Expression in Rat Cortical Cell Cultures by Tool, FAERS-Positive, and FAERS-Negative Compounds Was Not Associated with Neurotoxicity
The FAERS-positive and FAERS-negative compounds and even some of the tool compounds (pilocarpine, pentylenetetrazol, 4-aminopyridine, chlorpromazine, amoxapine, and donezepil) are administered clinically to treat various diseases in the brain or peripheral tissues or for diagnostic purposes. Consequently, defining the optimal concentration for in vivo gene expression studies in neuronal cell cultures of this heterogeneous list of compounds was challenging [30,31]. First, the dose selection was guided by the information available on the therapeutic plasma concentrations. Second, we tested the neurotoxicity of each compound around the therapeutic concentration range and selected the highest tolerated concentration in our assay, as our preliminary studies with 4-aminopyridine, bupropion, kainic acid, and pentylenetetrazol indicated that the drug-induced gene expression was highly concentration-dependent.
Of the 34 compounds tested, only maprotiline (30 µM), miconazole (30 and 100 µM), chlorpromazine (100 µM), and kainic acid (100 µM) induced neurotoxicity in rat cortical neuronal cell cultures, as indicated by the increased lactate dehydrogenase (LDH) levels in the conditioned medium. Consistent with the present data, kainic acid was reported to reduce the cell viability in mouse cortical neuronal cultures at concentrations between 10 and 1000 µM [32]. Furthermore, maprotiline reduced the viability of neuro-2a cell cultures at concentrations between 5 and 100 µM [33]. Based on the toxicity information, the drug concentrations were adjusted to the highest non-toxic levels to minimize the lack of induced expression, which was needed for the design of the assay.

The Tool, FAERS-Positive, and FAERS-Negative Compounds Induce Category-Specific Gene Expression, but Pathway and GO Analyses Revealed No Specific Markers for Seizure Liability
First, we compared the gene expression generated by all the compounds included in the tool, FAERS-positive, or FAERS-negative categories. Our aim was to identify an "ictogenic gene expression signature" that would be induced by all or a majority of the tool compounds, and then to use the signature to categorize the FAERS-positive compounds as "potentially" ictogenic and the FAERS-negative compounds as "non-ictogenic" based on their gene expression profile.
Venn diagrams indicated that the three compound categories had an overlap of over 2000 DE genes. The tool compounds, however, also had DE genes specific to that compound category, suggesting that gene expression might indeed be able to differentiate the risk of ictogenicity.
Next, we investigated whether the gene expression profiles of the FAERS-positive and FAERS negative compounds had pathways or GO terms comparable to those of the tool compounds, which would reveal the presence or lack of their seizure-inducing properties.
Contrary to our expectations, the analysis of GO terms in the tool compound category revealed no seizure-related terms. For example, 4-aminopyridine, amoxapine, bicuculline, chlorpromazine, donepezil, kainic acid, picrotoxin, and SNC80 indicated "aorta morphogenesis" and "collagen metabolic process" terms. The FAERS-positive drugs, however, indicated several GO terms that could indicate the risk of seizures, e.g., aminophylline, amitriptyline, clozapine, maprotiline, mirtazapine, theophylline, tramadol, and venlafaxine were linked to the terms "calcium ion regulated exocytosis of neurotransmitter" and "regulation of membrane potential". Furthermore, the FAERS-negative compounds azelastine, imiquimod, miconazole, minoxidil, ospemifene, rosiglitazone, and valdecoxib were linked to the terms "regulation of neurotransmitter levels" and "regulation of postsynaptic membrane potential", which could indicate the molecular pathways related to seizure liability. In general, the enriched GO terms indicated that the most common GO terms in the tool, FAERS-positive, and FAERS-negative categories were "cell cycle process" and "immune response". Interestingly, both of these processes relate to the acute phase of status epilepticus, epileptogenesis, and epilepsy [38][39][40].
Taken together, even though the gene expression profiles of the tool, FAERS-positive, and FAERS-negative compounds were linked to several enriched pathways and GO terms potentially signaling for seizure liability, they were not specific to any compound category.

From a Common to Drug-Specific Ictogenic Signature
As the pathway analysis indicated compound-specific rather than compound-common gene expression profiles, we next compared the gene expression of each FAERS-positive compound to the 11 tool compounds one-by-one using three different strategies.
Alikeness-% indicated the percentage of similar DE genes between the two compounds. The sum of the alikeness-% informed us of the similarity to all 11 tool compounds, and was the highest for clozapine, which showed overlapping gene expression with 9 of the 11 tool compounds. Importantly, the basic assumption was that the compound had to induce DE gene expression. Contrary to our expectations, 2 of the 13 FAERS-positive compounds induced no (paroxetine) or a very low number (three, diphenhydramine) of DE genes. After their exclusion, 9 of the 11 remaining FAERS-positive drugs had a sum alikeness-% ≥10%, indicating a DE gene expression similar to that of one or more tool compounds with known seizure liability. The 32 DE genes induced by isoniazid with the lowest sum alikeness-% of 0.3 had almost no similarity to any of the tool compounds, suggesting different/novel mechanisms for its seizure liability. The 151 DE genes induced by venlafaxine had a sum alikeness-% of 7.4, of which the highest single alikeness-% (2.4) was with SNC80.
The GSEA score takes into account the similarity in both the upregulated and downregulated genes between the FAERS-positive and tool compounds. The analysis revealed that 8 of the 13 FAERS-positive compounds showed similar gene regulation to one or more tool compounds. After the exclusion of the two low-inducing FAERS-positive compounds, the GSEA analysis found similarities between 8 of the 11 FAERS-positive compounds and the tool compounds. Isoniazid (32 DE genes), temozolomide (21 DE genes), and tramadol (100 DE genes) were undetected by the GSEA analysis.
The machine-learning approach correctly classified 10 of the 13 FAERS-positive compounds from the FAERS-negative compounds. After the exclusion of the two low-inducing FAERS-positive compounds, machine learning correctly separated 10 of the 11 FAERS positive compounds from the FAERS-negative compounds. The approach did not differentiate venlafaxine as a FAERS-positive compound.
Finally, the rankings of the compounds by the different approaches were comparable. For example, both the alikeness-% and the GSEA score were highest for clozapine and mirtazapine. Venlafaxine was not found by the alikeness-% or machine-learning approaches and was ranked lowest by the GSEA score.

Rat Primary Cortical Cell Cultures
Primary cortical cell cultures were used for the cytotoxicity assay, compound dose optimization assay, and the assessment of compound-induced gene expression. The cell culturing pipeline is summarized in Figure 6.

Production of Embryos
Adult female Sprague-Dawley rats of breeding age (Envigo, Horst, Limburg, The Netherlands) were injected with the luteinizing hormone-releasing hormone (LHRH) agonist (#L4513, Sigma-Aldrich, St. Louis, MO, USA; single injection of 40 µg, i.p.) to ensure an optimal menstrual cycle for coupling. Five days after the LHRH injection, the females were coupled overnight with the male rats ( Figure 6), which was counted as embryotic day 0 (E0). On E18, the rats were anesthetized by the inhalation of 5% isoflurane, the neck was dislocated, and the embryos were collected to dissect the cerebral cortex for the cell cultures.

Rat Primary Cortical Cell Cultures
Primary cortical cell cultures were used for the cytotoxicity assay, compound dose optimization assay, and the assessment of compound-induced gene expression. The cell culturing pipeline is summarized in Figure 6.

Production of Embryos
Adult female Sprague-Dawley rats of breeding age (Envigo, Horst, Limburg, The Netherlands) were injected with the luteinizing hormone-releasing hormone (LHRH) agonist (#L4513, Sigma-Aldrich, St. Louis, MO, USA ; single injection of 40 µ g, i.p.) to ensure an optimal menstrual cycle for coupling. Five days after the LHRH injection, the females were coupled overnight with the male rats ( Figure 6), which was counted as embryotic day 0 (E0). On E18, the rats were anesthetized by the inhalation of 5% isoflurane, the neck was dislocated, and the embryos were collected to dissect the cerebral cortex for the cell cultures.
For the cytotoxicity experiment, four cell culture batches were generated. In batch #1, three females produced 38 embryos; in batch #2, three females produced 29 embryos; in batch #3, four females produced 52 embryos; and in batch #4, four females produced 49 embryos. The cerebral cortex was dissected and plated on 96-well plates.
For the compound dose optimization experiment, three female rats produced 45 embryos. The cortical cells were plated on 24-well plates.
For the compound-induced gene expression experiment, four female rats produced 49 embryos. The cortical cells were plated on 24-well plates. For the cytotoxicity experiment, four cell culture batches were generated. In batch #1, three females produced 38 embryos; in batch #2, three females produced 29 embryos; in batch #3, four females produced 52 embryos; and in batch #4, four females produced 49 embryos. The cerebral cortex was dissected and plated on 96-well plates.
For the compound dose optimization experiment, three female rats produced 45 embryos. The cortical cells were plated on 24-well plates.
For the compound-induced gene expression experiment, four female rats produced 49 embryos. The cortical cells were plated on 24-well plates.

Compound Selection
The 11 tool compounds were selected based on their known ictogenic properties and different mechanisms of action. The 13 FAERS-positive test compounds included those frequently associated with seizures based on the FAERS database and the VigiAccess adverse event records [29] followed by the disproportionality analysis [43]. The 10 FAERSnegative compounds exhibited very low or no linkage with seizures in the analysis.

Assessment of Cytotoxicity
To determine the highest tolerated (non-toxic) concentration of the 11 tool, 13 FAERSpositive, and 10 FAERSnegative compounds, the cortical cell cultures were incubated with each drug for 24 h and the amount of LDH was measured as an indicator of cytotoxicity. Four drug concentrations and four replicate wells were analyzed. The range of drug concentrations included the known therapeutic plasma concentration in humans, which was determined by a literature review (Table 8). Table 8. Concentration range assessed (µM), compound-induced cytotoxicity in the rat cortical neuronal cultures, the highest tolerated or tested concentration, and the concentration selected for the compound-induced gene expression experiment of the 11 tool, 13 FAERS-positive, and 10 FAERSnegative compounds. The vendor and product number of the compounds purchased are shown in the right-hand columns. The concentrations for cytotoxicity testing were selected based on a literature review of known therapeutic plasma concentration in humans. * Statistical analysis indicated that a 1000 µM concentration of theophylline showed a trend towards cytotoxicity. Consequently, a lower concentration was used in the mRNA-seq experiment.
The LDH assay was performed using the Cytotoxicity Detection Kit (#11644793001, Roche, Basel, Switzerland) as described previously [44]. Briefly, the culture medium was collected from the 96-well cell culture plate, pipetted to a 96-well microplate, and then centrifuged (300 g, 3 min, room temperature) to pellet the possible cell remains to the bottom. The supernatant was then diluted 1:4 with distilled water and the reaction mixture was prepared by mixing a dye solution and a catalyst at a ratio of 1:45. Then, 100 µL of the diluted supernatant and 100 µL of the reaction mixture were pipetted onto the 96-well plate and incubated in the dark for 30 min at room temperature. Finally, the absorbance was measured at 490 nm with a microplate reader (Wallac Victor 2 , Perkin Elmer, CA, USA).
The four wells incubated without any treatment or only with vehicle (DMSO 0.1%) served as a "low control". The four wells treated with Triton X-100 (#93418, Sigma), leading to a maximal LDH release, served as a "high control". To calculate the magnitude of cytotoxicity % in each treated well, the following formula was used: The statistical significance of the difference between the compound-and vehicleinduced cytotoxicity was assessed with the Kruskal-Wallis test, followed by a post hoc analysis using the Mann-Whitney U test in GraphPad Prism software (version 9.3.1).

Effect of Compound Dose on Gene Expression and the Compound-Induced Gene Expression Profile
Next, we determined the gene expression profile of each of the 34 compounds in the rat cortical neuronal cell cultures. After a 24-h drug exposure, the cells were collected from the 24-well plates, the total RNA was extracted, the RNA-sequencing library was constructed, and the RNA was sequenced using the Illumina platform (see Section 4.4.3 for the sequencing library construction and sequencing details).

Extraction of Total RNA
The RNA was extracted with an RNeasy mini kit (#74104; Qiagen, Hilden, Germany) according to the manufacturer's instructions. Briefly, after a 24-h incubation with the drugs, the cell culture media were removed from the wells (four replicates) and replaced with 350 µL of RLT buffer containing 1% of mercaptoethanol (#M6250, Sigma). The cells were scraped into the RLT buffer with a cell scraper (#83.1832, Sarstedt, Newton, NC, USA) and transferred to a 1.5-Ml centrifuge tube. The cells were homogenized by vortexing for 1 min. Then, 350 µL of 70% ethanol was added to the lysate. The lysate was then transferred to a column, centrifuged, and washed with RW1 buffer. For the DNA removal, DNAse I (#79256, Qiagen) and an RDD buffer mixture were added to the column and the mixture was incubated at room temperature for 15 min. The column was washed with RW1 and an RPE buffer according to the manufacturer's protocol. The remaining RNA was eluted to 50 µL of RNase-free water, aliquoted, and stored at −70 • C for quality control and RNA-seq.

Quality Control of Extracted RNA
The RNA concentration and RNA integrity number (RIN) were measured with a 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) using the RNA 6000 Nano Kit (#5067-1511, Agilent).
In the compound dose optimization experiment, the median concentration of the extracted RNA was 102.8 ng/µL (range 39.4-153.3 ng/µL) and the median RIN was 8.8 (range 7.7-9.6).
In the compound-induced gene expression experiment, the median concentration of the extracted RNA was 37.4 ng/µL (range 15. 8-153.4) and the median RIN was 9.3 (range 7.7-10).

mRNA-seq Library and Sequencing
In the compound dose optimization experiment, 300 ng of the total RNA was used to prepare the sequencing library using a TruSeq Stranded mRNA HT Kit (#15031047, Illumina, San Diego, CA, USA). The sequencing was conducted using a NovaSeq 6000 instrument (Illumina) with 2 × 50 bp paired end reads targeting 15-18 M raw reads per replicate.
In the compound-induced gene expression experiment, 100 ng of the total RNA was used to prepare the sequencing library using an Illumina Stranded mRNA Preparation kit (#1000000124518, Illumina).
The sequencing was performed using the NovaSeq 6000 instrument (Illumina) with 2 × 100 bp paired end reads, aiming at 27-33 M raw reads per replicate. The details of the mRNA-seq data quality control, mapping, and identification of the DE genes have been previously described [45]. Briefly, the quality control of the mRNA-seq reads was performed using FastQC and MultiQC to define the quality score and the number of produced reads [46,47]. The reads were mapped to the Ensemble RN6 genome (Rnor 6.0.99) with STAR software (version 2.7.3a) [48]. In the dose-effect experiment, the median of the mapped reads was 17.4 M (range 14.2-21.1 M). In the gene expression experiment, the median of the mapped reads was 34.9 M (range 8.7-69.7 M).

Identification of DE Genes
The DE genes were identified using the DEseq2 [49] R package (R version 4.1.0). The Benjamini-Hochberg false discovery rate (FDR) was used to calculate the adjusted p-value. The differences in the gene expression between the drug-exposed and vehicle-exposed cultures were considered significant if the FDR was <0.05 and the log2(|FC|) was >1.5. Normalized read counts in fragments per kilobase of per million mapped reads (FPKM) were extracted with the fpkm-function in DeSeq2 for further use in machine-learning analysis.

Comparison of the Number of DE Genes between the Tool, FAERS-Positive, and FAERS-Negative Compounds
Venn diagrams were generated using the VennDiagram R package in R to assess the similarity in the gene expression profiles induced by the tool, FAERS-positive, and FAERS-negative compounds [50]. A gene was classified into the tool, FAERS-negative, or FAERS-positive category if its differential expression was induced by at least one compound in a given compound category.
A cross-tabulation table, showing the number of DE genes between all the 34 tested compounds was generated to assess the similarity of the drug-induced gene expression between the compound categories (Table 3).

Identification of Enriched Pathways
The ClusterProfiler R-package was used to identify enriched pathways, compare identified pathways between the tool, FAERS-positive, and FAERS-negative compounds, and to visualize the results [51]. The pathway identification was performed using pathways in Reactome in the ClusterProfiler [52]. A pathway was considered enriched if the FDR was <0.05.

Gene Ontology Analysis of DE Genes
To identify the enriched GO terms of the DE genes for each compound, the GSEA was performed using the Molecular Signatures Database (MSigDB). The rat IDs of the DE genes were converted to their human homologues in the Ensembl Biomart web interface [53]. To generate the rank list, the DE genes were ranked according to the log2(|FC|), which were then loaded into the GSEA software [54]. In the GSEA, enriched GO terms were obtained for each ranked list by performing a GSEA pre-ranked analysis using the biologic processes gene sets (BP-subset of C5 category, version 7.5) with 1000 permutations, and no collapse option [55].
To identify the enriched GO terms (FDR < 0.05) for each tool, FAERS-positive and FAERS-negative compound, the enriched terms were analyzed with Cytoscape (version 3.9, FDR < 0.05) [56]. Then, to visualize the GO terms related to each compound category, word clouds were generated in R using the word clouds package [57,58]. To simplify the wording in the word clouds, the GO terms were abbreviated. That is, the prefix GOBP was removed, and then interleukin (IL), G protein-coupled receptor (GPCR), transforming growth factor (TGF), tumor necrosis factor (TNF), extracellular matrix (ECM), reactive oxygen species (ROS), and endoplasmic reticulum (ER) were replaced with abbreviations. The GO terms "apoptotic process" or "cell-mediated apoptotic process" were shortened to "apoptosis". "Regulation of and positive/negative regulation" was removed from all the GO terms. The GO terms with "cell-type-mediated cytokine production" were shortened to "cytokine production" and the GO terms with "cell-type-mediated immune response" were shortened to "immune response"; "metabolic process" was shortened to "metabolism" and "different cell cycle phases" was shortened to "cell cycle process". 4.5.6. Alikeness-% and GSEA Score The gene expression "alikeness-%" was generated to determine the likelihood of a given compound having ictogenic properties based on the DE genes. To reduce noise in the alikeness-% (i.e., changes in the gene expression profile that do not contribute to ictogenicity), the DE genes of the FAERS-negative compounds, which were also regulated by the tool or FAERS-positive compounds, were removed ( Figure 2). Next, the DE genes with the same direction of regulation by both the drug of interest and the tool compound (e.g., upregulated by both a given FAERS-positive compound and any tool compound) were identified. Then, the alikeness-% for a given FAERS-positive compound with any of the tool compounds was calculated as follows: (number of the same DE genes with a tool compound/number of all DE genes induced by the tool compound) × 100%. This was repeated for all the combinations of FAERS-positive and tool compounds ( Table 5).
As another approach to identify ictogenic properties in the gene expression profile, we generated the GSEA score. As for the generation of the alikeness-%, the DE genes of the FAERS-negative compounds only were removed from the expression profiles of the FAERS-positive and tool compounds. Then, the GSEA rank lists were generated from the gene expression data available from the tool compounds by ranking the genes in order according to the p-value of the mRNA differential expression. The upregulated genes were assigned a positive rank number and the downregulated genes a negative rank number. Then, two gene sets were generated for all the FAERS-positive compounds: one for the upregulated DE genes and another for the downregulated DE genes. Finally, to assess enrichment, all the gene sets were compared to all the ranked lists with the fgsea R-package, and significant normalized enrichment scores (adjusted p-value < 0.05) were included with the GSEA scores ( Prior to classification, the low-expressing genes were filtered by removing genes with fewer than five normalized counts (FPKM) in at least 50% of the samples. This reduced the number of genes from 32,623 to 10,326. The classification was performed using a pipeline, including three stages: (1) feature pre-processing, (2) feature selection, and (3) classification. In the feature pre-processing step, the counts from each sample were rank-transformed [60] and the per-gene median of the ranks over the four replicates was calculated for each compound. The medians were standardized to a zero mean and unit variance over the dataset. In the feature selection step, the genes were filtered by selecting the k genes with the highest F-value (the ratio of the variance of group means over the mean of withingroup variances). The value for k and whether to omit the filtering step were selected during the pipeline hyperparameter optimization. In the classification step, the compound was classified using logistic regression (LR) with l1 or l2 regularization [61] and support vector machines (SVM) [62] with linear or radial basis kernels. The mode and strength of regularization for the LR and SVM kernel along with the regularization strength were selected during the hyperparameter optimization.
The classification performance was evaluated using nested stratified cross-validation, where all the compounds (i.e., tool, FAERS-positive, and FAERS-negative compounds) were repeatedly split into testing and training sets. The testing set contained a single compound from one of the three compound categories (target group) and three to four randomly sampled compounds from the entire (non-target) compound set. The training set contained the remaining compounds that were not included in the testing set. In the outer cross-validation loop, each testing set was classified utilizing a classifier trained on the compounds in the training set. The predicted classes and actual classes were pooled over the outer cross-validation to calculate the classification performance metrics. The pipeline hyperparameters were optimized in an inner leave-one-out cross-validation loop performed over the training compounds. The classification pipeline and cross-validation were implemented using Python 3.10.6, Sklearn 1.0.2, and Numpy 1.2.

Classification Utilizing DE Genes Shared with Tool Compounds
The set of DE genes with a positive FC > 1.5 was determined for each compound using DeSeq2 and FC shrinkage via apeglm [63]. All the DE genes expressed by the FAERSnegative compounds were subtracted from the DE gene list. Then, the number of tool compounds sharing at least one DE gene with the FAERS-positive or FAERS-negative compounds was calculated. Next, the predictive power was evaluated using leave-one-out cross-validation, where each compound in the combined set of FAERS-positive or -negative compounds was left out one-by-one. A logistic regression model with the constant was trained on the number of tool compounds with at least one similar DE gene. The DE analysis and classification were performed using R 4.1.3.

Data Visualization
A dimensionality reduction using UMAP to 2-dimensional space, preceded by a standardization to zero mean and unit variance per feature, was performed for the visualization of the high-dimensional data. For the normalized RNA-seq counts, PCA to 30D space was performed prior to reduction.

Conclusions
Alikeness-% correctly categorized 85%, the GSEA score correctly categorized 73%, and the machine-learning approach correctly categorized 91% of the FAERS-positive compounds with reported seizure liability in clinical use. Several factors can explain the incorrect prediction of seizure liability. These include a low number of induced DE genes, the heterogeneity of the mechanism of action of the compounds, and the heterogeneity of the induced gene expression by different compounds. Only one exposure duration was used, which may not completely capture the full expression profile of the different compounds. In addition, the number of tool compounds was rather small, representing a limited spectrum of ictogenic mechanisms and gene expression patterns for comparison. Furthermore, some seizure liability mechanisms and gene expression patterns remain to be discovered. Finally, the set of test drugs (FAERS-positive compounds) with reported seizure liability were chosen from the FAERS adverse event database [28], leaving some uncertainty regarding whether the ictogenic properties of the compound were related to the condition treated by the drug rather than the drug itself, and consequently, would present a false positive in the FAERS compound category. A larger number of tool compounds with known seizure liability could further improve the sensitivity and specificity of the test platform. Moreover, the expansion of the list of FAERS-negative compounds would help to remove the non-ictogenicity -related "noise" in the DE gene list of tool and test compounds. Overall, gene expression-based analysis shows promise as a novel tool to predict the seizure liability of novel compounds.

Data Availability Statement:
The RNA-Seq data in FASTQ-files generated in this publication have been deposited with links to BioProject accession number PRJNA913400 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject/ accessed on 13 December 2022).